126 research outputs found

Beyond Folklore: A Scaling Calculus for the Design and Initialization of ReLU Networks

Author: Léon Bottou
Aaron Defazio
Publication date: 11/02/2021

We propose a system for calculating a "scaling constant" for layers and weights of neural networks. We relate this scaling constant to two important quantities that relate to the optimizability of neural networks, and argue that a network that is "preconditioned" via scaling, in the sense that all weights have the same scaling constant, will be easier to train. This scaling calculus results in a number of consequences, among them the fact that the geometric mean of the fan-in and fan-out, rather than the fan-in, fan-out, or arithmetic mean, should be used for the initialization of the variance of weights in a neural network. Our system allows for the off-line design & engineering of ReLU neural networks, potentially replacing blind experimentation.

arXiv.org e-Print Archive
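
The initialization rule named in the abstract above can be illustrated with a minimal sketch. It compares the four candidate fan terms (fan-in, fan-out, arithmetic mean, geometric mean) and samples Gaussian weights whose variance is a gain divided by the chosen term. The function names, the `mode` argument, and the gain of 2.0 (a common He-style choice for ReLU) are assumptions made for illustration; the abstract does not state the constant.

```python
import math
import random


def fan_scale(fan_in, fan_out, mode="geometric"):
    """Return the fan term that divides the gain in the weight variance."""
    if mode == "fan_in":
        return float(fan_in)
    if mode == "fan_out":
        return float(fan_out)
    if mode == "arithmetic":
        return (fan_in + fan_out) / 2.0
    if mode == "geometric":
        # Geometric mean of fan-in and fan-out, as the abstract recommends.
        return math.sqrt(fan_in * fan_out)
    raise ValueError(f"unknown mode: {mode}")


def init_weights(fan_in, fan_out, mode="geometric", gain=2.0, seed=0):
    """Sample a fan_out x fan_in matrix of zero-mean Gaussian weights.

    Variance is gain / fan_scale(...). gain=2.0 is an assumption
    (He-style for ReLU), not a value taken from the abstract.
    """
    rng = random.Random(seed)
    std = math.sqrt(gain / fan_scale(fan_in, fan_out, mode))
    return [[rng.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]


if __name__ == "__main__":
    # For a layer with very different fan-in and fan-out, the four
    # choices give noticeably different variances.
    for mode in ("fan_in", "fan_out", "arithmetic", "geometric"):
        print(mode, 2.0 / fan_scale(64, 1024, mode))
```

For a square layer all four modes coincide; they differ only when fan-in and fan-out differ, which is where the choice of mean matters.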

Geometric clustering using the information bottleneck method

Author: Léon Bottou
Susanne Still
William Bialek
Publication venue: MIT Press

We argue that K-means and deterministic annealing algorithms for geometric clustering can be derived from the more general Information Bottleneck approach. If we cluster the identities of data points to preserve information about their location, the set of optimal solutions is massively degenerate. But if we treat the equations that define the optimal solution as an iterative algorithm, then a set of “smooth” initial conditions selects solutions with the desired geometrical properties. In addition to conceptual unification, we argue that this approach can be more efficient and robust than classic algorithms.

CiteSeerX
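
To make the idea of treating self-consistent equations as an iterative algorithm concrete, here is a rough sketch of generic soft K-means at a fixed temperature, the kind of update that deterministic annealing repeats while lowering the temperature. It is not the paper's Information Bottleneck derivation: the one-dimensional data, the squared-distance cost, the omission of cluster priors, and all function names are assumptions chosen for brevity.

```python
import math


def soft_assign(points, centers, temperature):
    """Return P(cluster | point), proportional to exp(-sq_dist / temperature)."""
    result = []
    for x in points:
        dists = [(x - c) ** 2 for c in centers]
        smallest = min(dists)  # subtract the minimum for numerical stability
        weights = [math.exp(-(d - smallest) / temperature) for d in dists]
        total = sum(weights)
        result.append([w / total for w in weights])
    return result


def update_centers(points, resp, centers):
    """Move each center to the responsibility-weighted mean of the points."""
    new_centers = []
    for k, old in enumerate(centers):
        mass = sum(r[k] for r in resp)
        if mass == 0.0:
            new_centers.append(old)  # keep a center that owns no mass
        else:
            weighted = sum(r[k] * x for r, x in zip(resp, points))
            new_centers.append(weighted / mass)
    return new_centers


def soft_kmeans(points, centers, temperature, steps=50):
    """Alternate the two updates for a fixed number of steps."""
    for _ in range(steps):
        resp = soft_assign(points, centers, temperature)
        centers = update_centers(points, resp, centers)
    return centers


if __name__ == "__main__":
    data = [0.0, 0.2, 10.0, 10.2]
    print(soft_kmeans(data, [1.0, 9.0], temperature=0.1))
```

As the temperature approaches zero the assignments become hard and the iteration behaves like ordinary K-means; at higher temperatures the assignments stay soft.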